
    Towards multi-scale feature detection repeatable over intensity and depth images.

    Object recognition based on local features computed at multiple locations is robust to occlusions, strong viewpoint changes, and object deformations. These features should be repeatable, precise, and distinctive. We present an operator for repeatable feature detection on depth images (derived from 3D models) as well as on 2D intensity images. The proposed detector is based on estimating curviness saliency at multiple scales in each kind of image. We also propose quality measures that evaluate the repeatability of features between depth and intensity images. The experiments show that the proposed detector outperforms both the most powerful classical point detectors (e.g., SIFT) and edge detection techniques.
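    The abstract does not spell out how curviness saliency is computed; the sketch below shows one plausible multi-scale formulation, assuming the response is derived from the eigenvalues of the Gaussian-smoothed image Hessian. The function name `curviness_saliency`, the scale set, and the eigenvalue combination are illustrative rather than the paper's exact operator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def curviness_saliency(image, sigmas=(1.0, 2.0, 4.0)):
    """Illustrative multi-scale curviness map from Hessian eigenvalues.

    For each Gaussian scale, second derivatives are computed and the
    magnitude of the smaller Hessian eigenvalue is used as a per-pixel
    'curviness' response; responses are max-pooled over scales.
    """
    image = image.astype(np.float64)
    response = np.zeros_like(image)
    for sigma in sigmas:
        # Second-order Gaussian derivatives (scale-normalised by sigma^2).
        Hxx = gaussian_filter(image, sigma, order=(0, 2)) * sigma**2
        Hyy = gaussian_filter(image, sigma, order=(2, 0)) * sigma**2
        Hxy = gaussian_filter(image, sigma, order=(1, 1)) * sigma**2
        # Eigenvalues of the 2x2 Hessian [[Hxx, Hxy], [Hxy, Hyy]].
        trace = Hxx + Hyy
        diff = np.sqrt((Hxx - Hyy) ** 2 + 4.0 * Hxy**2)
        lam1 = 0.5 * (trace + diff)
        lam2 = 0.5 * (trace - diff)
        # Curved structures give two large-magnitude eigenvalues.
        response = np.maximum(response, np.minimum(np.abs(lam1), np.abs(lam2)))
    return response
```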

    Support Vector Machine (SVM) Recognition Approach adapted to Individual and Touching Moths Counting in Trap Images

    This paper develops an automatic algorithm for moth recognition from trap images in real-world conditions. The method builds on our previous work for detection [1] and introduces an adapted classification step. More precisely, an SVM classifier is trained with a multi-scale descriptor, the Histogram of Curviness Saliency (HCS). This descriptor is robust to illumination changes and is able to detect and describe both the external and internal contours of the target insect at multiple scales. The proposed classification method can be trained with a small set of images. Quantitative evaluations show that the proposed method classifies insects with higher accuracy (95.8%) than state-of-the-art approaches.
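    As a rough illustration of the classification step, the sketch below trains an SVM on fixed-length descriptors, assuming the HCS features have already been extracted into vectors; the random stand-in data, the RBF kernel, and its hyperparameters are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_samples, n_features) HCS descriptors extracted from trap-image crops,
# y: binary labels (1 = target moth, 0 = other insect / background).
# The descriptor extraction itself is paper-specific and not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 128))          # stand-in for real HCS descriptors
y = rng.integers(0, 2, size=200)         # stand-in labels

clf = make_pipeline(
    StandardScaler(),                    # SVMs are sensitive to feature scale
    SVC(kernel="rbf", C=1.0, gamma="scale", probability=True),
)
clf.fit(X, y)
print(clf.predict(X[:5]), clf.predict_proba(X[:5])[:, 1])
```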

    Towards Recognizing 3D Models Using a Single Image

    As 3D data becomes more popular, techniques for retrieving a particular 3D model are necessary. We want to recognize a 3D model from a single photograph; since any user can easily take a picture of a model he/she would like to find, querying with an image is simple and natural. However, a 2D intensity image depends on viewpoint, texture, and lighting conditions, so matching it with a 3D geometric model is very challenging. This paper proposes a first step towards matching a 2D image to models, based on features that are repeatable in 2D images and in depth images (generated from 3D models); we show their independence from texture and lighting. The detected features are then matched to recognize 3D models by combining HOG (Histogram of Oriented Gradients) descriptors and repeatability scores. The proposed method reaches a recognition rate of 72% across 12 3D object categories and outperforms classical feature detection techniques for recognizing 3D models from a single image.
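    A minimal sketch of how HOG descriptors could be compared between a photograph and a depth image rendered from a candidate model, assuming features are given as pixel keypoints; the patch size, HOG parameters, and nearest-neighbour scoring are illustrative and do not include the paper's repeatability scores.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_patch_descriptor(image, keypoint, patch_size=64):
    """HOG descriptor of a square patch centred on a detected feature."""
    r, c = keypoint
    half = patch_size // 2
    patch = image[max(r - half, 0):r + half, max(c - half, 0):c + half]
    patch = resize(patch, (patch_size, patch_size), anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def match_score(photo, photo_kps, depth, depth_kps):
    """Average nearest-neighbour HOG distance between the photo's features
    and a candidate model's depth-image features (lower = better match)."""
    d_photo = np.array([hog_patch_descriptor(photo, kp) for kp in photo_kps])
    d_depth = np.array([hog_patch_descriptor(depth, kp) for kp in depth_kps])
    dists = np.linalg.norm(d_photo[:, None, :] - d_depth[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```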

    Recognizing food places in egocentric photo-streams using multi-scale atrous convolutional networks and self-attention mechanism.

    Wearable sensors (e.g., lifelogging cameras) are very useful tools for monitoring people's daily habits and lifestyle. Wearable cameras can continuously capture different moments of their wearers' day, their environment, and their interactions with objects, people, and places, reflecting their personal lifestyle. The food places where people eat, drink, and buy food, such as restaurants, bars, and supermarkets, can directly affect their daily dietary intake and behavior. Consequently, an automated monitoring system that analyzes a person's food habits from daily recorded egocentric photo-streams of food places can provide valuable means for improving eating habits, for example by generating a detailed report of the time spent in specific food places, obtained by classifying the captured food-place images into different groups. In this paper, we propose a self-attention mechanism with multi-scale atrous convolutional networks to generate discriminative features from image streams and recognize a predetermined set of food-place categories. We apply our model to an egocentric food-place dataset called 'EgoFoodPlaces' that comprises 43,392 images captured by 16 individuals using a lifelogging camera. The proposed model achieved an overall classification accuracy of 80% on the 'EgoFoodPlaces' dataset, outperforming baseline methods such as VGG16, ResNet50, and InceptionV3.
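    A minimal sketch of a classification head that combines parallel atrous (dilated) convolutions with a simple self-attention layer, assuming backbone feature maps are supplied by a CNN; the channel sizes, dilation rates, and attention form are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AtrousSelfAttentionHead(nn.Module):
    """Illustrative head: parallel dilated convolutions capture multi-scale
    context, and a self-attention layer re-weights spatial positions
    before classification into food-place categories."""

    def __init__(self, in_ch=512, mid_ch=128, num_classes=15):
        super().__init__()
        # Parallel atrous (dilated) branches, ASPP-style.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 3, padding=d, dilation=d)
            for d in (1, 2, 4, 8)
        ])
        fused = mid_ch * 4
        # Simple non-local-style self-attention over spatial positions.
        self.query = nn.Conv2d(fused, fused // 8, 1)
        self.key = nn.Conv2d(fused, fused // 8, 1)
        self.value = nn.Conv2d(fused, fused, 1)
        self.classifier = nn.Linear(fused, num_classes)

    def forward(self, feats):                          # feats: (B, in_ch, H, W)
        x = torch.cat([torch.relu(b(feats)) for b in self.branches], dim=1)
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.key(x).flatten(2)                     # (B, C/8, HW)
        v = self.value(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, C, H, W) + x  # residual
        return self.classifier(x.mean(dim=(2, 3)))     # global average pool
```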

    Breast tumor segmentation in ultrasound images using contextual-information-aware deep adversarial learning framework.

    Automatic tumor segmentation in breast ultrasound (BUS) images is still a challenging task because of many sources of uncertainty, such as speckle noise, very low signal-to-noise ratio, shadows that make the anatomical boundaries of tumors ambiguous, and highly variable tumor sizes and shapes. This article proposes an efficient automated method for tumor segmentation in BUS images based on a contextual-information-aware conditional generative adversarial learning framework. Specifically, we introduce several enhancements to a deep adversarial learning framework to capture both texture features and contextual dependencies in BUS images and thereby address the challenges mentioned above. First, we adopt atrous convolution (AC) to capture spatial and scale context (i.e., the position and size of tumors) and handle very different tumor sizes and shapes. Second, we propose channel attention along with channel weighting (CAW) mechanisms to promote tumor-relevant features (without extra supervision) and mitigate the effects of artifacts. Third, we integrate the structural similarity index metric (SSIM) and the L1 norm into the loss function of the adversarial learning framework to capture local context information derived from the area surrounding the tumors. We used two BUS image datasets to assess the effectiveness of the proposed model. The experimental results show that the proposed model achieves competitive results compared with state-of-the-art segmentation models in terms of Dice and IoU metrics. The source code of the proposed model is publicly available at https://github.com/vivek231/Breast-US-project.
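    A minimal sketch of the third ingredient, a generator objective that adds SSIM and L1 terms to the adversarial loss, using a simplified uniform-window SSIM; the loss weights and window size are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=11, C1=0.01**2, C2=0.03**2):
    """Simplified SSIM with a uniform window; inputs in [0, 1], shape (B, 1, H, W)."""
    mu_x = F.avg_pool2d(x, window, stride=1, padding=window // 2)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=window // 2)
    var_x = F.avg_pool2d(x * x, window, 1, window // 2) - mu_x**2
    var_y = F.avg_pool2d(y * y, window, 1, window // 2) - mu_y**2
    cov = F.avg_pool2d(x * y, window, 1, window // 2) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2)
    return (num / den).mean()

def generator_loss(d_fake, pred_mask, gt_mask, lambda_l1=100.0, lambda_ssim=10.0):
    """Adversarial term plus L1 and (1 - SSIM) reconstruction terms."""
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    l1 = F.l1_loss(pred_mask, gt_mask)
    struct = 1.0 - ssim(pred_mask, gt_mask)
    return adv + lambda_l1 * l1 + lambda_ssim * struct
```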

    FGR-Net: interpretable fundus image gradeability classification based on deep reconstruction learning

    The performance of computer-aided diagnosis (CAD) systems for retinal diseases depends on the quality of the retinal images being screened. Thus, many studies have been developed to evaluate and assess the quality of such retinal images. However, most of them did not investigate the relationship between the accuracy of the developed models and the quality of the visualizations produced by interpretability methods for distinguishing between gradable and non-gradable retinal images. Consequently, this paper presents a novel framework called "FGR-Net" that automatically assesses and interprets underlying fundus image quality by merging an autoencoder network with a classifier network. The FGR-Net model also provides an interpretable quality assessment through visualizations. In particular, FGR-Net uses a deep autoencoder to reconstruct the input image in order to extract the visual characteristics of the input fundus images based on self-supervised learning. The features extracted by the autoencoder are then fed into a deep classifier network to distinguish between gradable and ungradable fundus images. FGR-Net is evaluated with different interpretability methods, which indicate that the autoencoder is a key factor in forcing the classifier to focus on the relevant structures of the fundus images, such as the fovea, optic disk, and prominent blood vessels. Additionally, the interpretability methods can provide visual feedback for ophthalmologists to understand how our model evaluates the quality of fundus images. The experimental results showed the superiority of FGR-Net over state-of-the-art quality assessment methods, with an accuracy above 89% and an F1-score above 87%. The code is publicly available at https://github.com/saifalkh/FGR-Net.
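    A minimal sketch of the autoencoder-plus-classifier idea: the encoder is shared between image reconstruction and gradability prediction; the layer sizes, joint loss, and input resolution are illustrative, not FGR-Net's actual design.

```python
import torch
import torch.nn as nn

class GradabilityNet(nn.Module):
    """Illustrative autoencoder plus classifier: the encoder is trained to
    reconstruct the fundus image, and its bottleneck features feed a
    small classifier that predicts gradable vs. ungradable."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

# Joint objective: reconstruction (self-supervised) + gradability classification.
model = GradabilityNet()
x = torch.rand(4, 3, 256, 256)
recon, logits = model(x)
loss = nn.functional.mse_loss(recon, x) + \
       nn.functional.cross_entropy(logits, torch.randint(0, 2, (4,)))
```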

    Food places classification in egocentric images using Siamese neural networks.

    Wearable cameras have become more popular in recent years for capturing unscripted moments from a first-person perspective, which helps in analyzing the user's lifestyle. In this work, we aim to identify a person's daily food patterns by recognizing food-related places in egocentric (first-person) images. This has the potential to support a system that assists with improving eating habits and preventing diet-related conditions. In this paper, we use Siamese Neural Networks (SNN) to learn similarities between images for one-shot "food places" classification. We tested the proposed method on "MiniEgoFoodPlaces", which covers 15 food-related place categories. The proposed SNN model with a MobileNet backbone achieved an overall classification accuracy of 76.74% and 77.53% on the validation and test sets of the "MiniEgoFoodPlaces" dataset, respectively, outperforming baseline models such as ResNet50, InceptionV3, and InceptionResNetV2.
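    A minimal sketch of a Siamese model with a shared MobileNetV2 backbone and one-shot classification by comparing a query against one exemplar per class; the embedding size, similarity head, and training details are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SiameseFoodPlaces(nn.Module):
    """Illustrative Siamese network: a shared MobileNetV2 backbone embeds two
    images, and the L1 distance between embeddings feeds a similarity score."""

    def __init__(self, embed_dim=256):
        super().__init__()
        backbone = mobilenet_v2()                  # pretrained weights optional
        self.features = backbone.features          # shared between both branches
        self.embed = nn.Linear(1280, embed_dim)
        self.head = nn.Linear(embed_dim, 1)        # similarity logit

    def embed_one(self, x):
        f = self.features(x).mean(dim=(2, 3))      # global average pooling
        return self.embed(f)

    def forward(self, x1, x2):
        e1, e2 = self.embed_one(x1), self.embed_one(x2)
        return self.head(torch.abs(e1 - e2))       # logit: same place or not

# One-shot classification: compare a query against one reference image per
# food-place category and pick the most similar category.
model = SiameseFoodPlaces().eval()
query = torch.rand(1, 3, 224, 224)
references = torch.rand(15, 3, 224, 224)           # one exemplar per class
with torch.no_grad():
    scores = model(query.expand(15, -1, -1, -1), references).squeeze(1)
print("predicted class:", int(scores.argmax()))
```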

    Breast tumor segmentation and shape classification in mammograms using generative adversarial and convolutional neural network.

    Inspecting mammograms in search of breast tumors is a demanding task that radiologists must carry out frequently. Therefore, image analysis methods are needed for the detection and delineation of breast tumors, which portray crucial morphological information that supports reliable diagnosis. In this paper, we propose a conditional Generative Adversarial Network (cGAN) devised to segment a breast tumor within a region of interest (ROI) in a mammogram. The generative network learns to recognize the tumor area and to create the binary mask that outlines it. In turn, the adversarial network learns to distinguish between real (ground truth) and synthetic segmentations, thus pushing the generative network to create binary masks that are as realistic as possible. The cGAN works well even when the number of training samples is limited. As a consequence, the proposed method outperforms several state-of-the-art approaches. Our working hypothesis is corroborated by diverse segmentation experiments performed on INbreast and a private in-house dataset. The proposed segmentation model, working on an image crop containing the tumor as well as a significant surrounding area of healthy tissue (loose-frame ROI), achieves a high Dice coefficient and Intersection over Union (IoU) of 94% and 87%, respectively. In addition, a shape descriptor based on a Convolutional Neural Network (CNN) is proposed to classify the generated masks into four tumor shapes: irregular, lobular, oval, and round. The proposed shape descriptor was trained on DDSM, since it provides shape ground truth (while the other two datasets do not), yielding an overall accuracy of 80%, which outperforms the current state of the art.
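    A minimal sketch of the shape-classification stage, a small CNN that maps a predicted binary mask to one of the four shape categories; the layer sizes and input resolution are illustrative and not the paper's descriptor.

```python
import torch
import torch.nn as nn

class MaskShapeCNN(nn.Module):
    """Illustrative CNN shape descriptor: classifies a binary tumor mask
    produced by the segmentation network into four shape categories
    (irregular, lobular, oval, round)."""

    SHAPES = ("irregular", "lobular", "oval", "round")

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, len(self.SHAPES)),
        )

    def forward(self, mask):                      # mask: (B, 1, H, W) in {0, 1}
        return self.net(mask)

# Usage: feed the binary mask predicted by the cGAN generator.
model = MaskShapeCNN()
mask = (torch.rand(2, 1, 128, 128) > 0.5).float()
print(model(mask).softmax(dim=1))
```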